N-1-3-040.31.1, Attribute Distribution and Search for Internet Resource Discovery, by Michael F. Schwartz*, <schwartz@latour.cs.colorado.edu>

In the previous issue of Internet Society News, I pointed out that resource discovery involves two basic problems: characterizing the resources of interest using name/attribute descriptions, and distributing this information so it can be searched flexibly and efficiently. I also discussed a number of approaches to the characterization problem. The current article considers attribute distribution and search.

The most straightforward solution to the distribution/search problem is to centralize resource information. This approach is taken by archie, which stores anonymous FTP directory listings on a central server. WAIS uses a centralized server to maintain a directory of WAIS servers. To date, centralized information has worked quite well in archie and WAIS. Archie maintains information about nearly 1,000 Internet archive sites, and handles thousands of queries per day. There are hundreds of WAIS servers registered in the top-level directory, and new servers are added often.

The problem with a centralized solution, of course, is that the central server can become a performance bottleneck and a critical point of failure, particularly as the scale of the system increases. The difficulty in sustaining reasonable response times in the face of tremendous popular demand for archie has moved the community to create replica servers. Doing so distributes the load, yet creates auxiliary problems of distributing the data and maintaining consistency between replicas. A future version of archie will address these problems using "lazy" update semantics to distribute data among replicas.

To reduce the scalability and consistency problems of a fully replicated directory, one can choose a solution where only parts of the resource data are maintained on any particular server.
A common approach is to impose some organizational properties on the data, and distribute data according to these properties. For example, the X.500 directory service standard divides information hierarchically. The tree is divided by country at the top level, and by administrative organization (company, university, etc.) at the next level down. Since the information in a hierarchy can be divided into arbitrarily many pieces, hierarchical directories scale well. Yet, it is only efficient to search hierarchical information according to the one way it is organized. For example, in X.500 it is efficient to find information about a person from a known country and organization, but it would be infeasible to find people according to their technical interests or other criteria that exist in the individual resource records, but that are not represented in the tree structure. One can mimic the effect of representing multiple search criteria in a hierarchy by maintaining separate structures with symbolic links to the "main" data, but searches still require expensive distributed operations.

If, on the other hand, one does not support search operations, symbolic linking can provide an acceptable mechanism. For example, the Prospero file system uses symbolic links to provide views of information in anonymous FTP and other Internet file systems. Users can browse the information, but search operations are not supported.

There are other ways to partially replicate resource attribute data beyond hierarchical distribution. One approach is to distribute information randomly among a set of servers, and cache the most popular information at each server. This approach requires that one sacrifice the ability to perform exhaustive searches. I experimented with one such protocol, optimized for locating a subset of the available copies of popular resources.
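
The trade-off in the X.500 example above, where lookup along the tree's own organization is cheap but search on any other attribute must visit every branch, can be illustrated with a small sketch. This is only a toy model under stated assumptions, not X.500 itself: the nested dictionary, the record fields, and the function names lookup_by_path and search_by_interest are all hypothetical, and every name in the data is invented.

```python
# Toy hierarchical directory: country -> organization -> person records.
# All names, organizations, and interests are invented for illustration.
DIRECTORY = {
    "US": {
        "Example University": [
            {"name": "A. Researcher", "interests": ["resource discovery"]},
            {"name": "B. Student", "interests": ["compilers"]},
        ],
        "Example Corp": [
            {"name": "C. Engineer", "interests": ["networking"]},
        ],
    },
    "GB": {
        "Sample Institute": [
            {"name": "D. Lecturer", "interests": ["resource discovery"]},
        ],
    },
}


def lookup_by_path(directory, country, organization, name):
    """Follow the tree's own organization; only one branch is examined."""
    for record in directory.get(country, {}).get(organization, []):
        if record["name"] == name:
            return record
    return None


def search_by_interest(directory, interest):
    """Search on an attribute that the tree is not organized by.

    Every organization under every country has to be visited. In a
    distributed directory, each visit could be a remote operation.
    """
    matches = []
    branches_visited = 0
    for country, organizations in directory.items():
        for organization, records in organizations.items():
            branches_visited += 1
            for record in records:
                if interest in record["interests"]:
                    matches.append((country, organization, record["name"]))
    return matches, branches_visited


if __name__ == "__main__":
    print(lookup_by_path(DIRECTORY, "US", "Example University", "A. Researcher"))
    print(search_by_interest(DIRECTORY, "resource discovery"))
```

In this sketch the path lookup examines a single organization's records, while the interest search has to touch all three organizations to be sure it has found every match. The number of branches visited grows with the size of the whole directory, which is the cost the article describes as infeasible at Internet scale.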
Another approach is to construct records describing particular collections of information available in a structured information space (such as a hierarchical file system) and register these records into auxiliary indices that can be searched with "flat" search operations. For example, one could register particularly important/popular directories in a large file system into indices focused on particular technical topics, so that users can search for information about these topics without regard to the organization in the main file system. This approach is used by the perspective discovery paradigm discussed in the previous issue of this newsletter.

*Assistant Professor, Department of Computer Science, University of Colorado - Boulder